Abstract

We discuss how to obtain an implicit description of the closure of a 
discrete exponential family with a finite set of equations derived from an 
underlying oriented matroid. These equations are similar to the equations 
used in algebraic statistics, although they need not be polynomial in the 
general case. This framework allows us to study the possible support sets 
of an exponential family with the help of oriented matroid theory. In
particular, if two exponential families induce the same oriented matroid, 
then they have the same support sets. 



1 Introduction 

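In this paper we study exponential families, which are well known statistical
models with many nice properties. Let $\mathcal{E}$ be an exponential family on a finite
set $\mathcal{X}$, and $\overline{\mathcal{E}}$ its closure. We want to describe the set

$$\mathcal{S} := \{\operatorname{supp}(P) \subseteq \mathcal{X} : P \in \overline{\mathcal{E}}\} \tag{1}$$

of all possible support sets occurring in $\overline{\mathcal{E}}$.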

The problem of determining the possible support sets in an exponential family
is a classical problem in statistics. It amounts to describing the boundary of
the most basic statistical models. This problem is related to characterizing the
marginal polytope, which can be used, for example, to study the existence or
non-existence of the MLE [EFRS06]. One can show that computing the support
sets of any exponential family is of the same complexity class as NP-hard
combinatorial problems such as the problem of finding maximal cuts in graphs, since
it is known that the class of marginal polytopes contains the so-called cut
polytopes (see [KWA09]). This means that there is no corresponding fast algorithm,
unless NP = co-NP [DL97]. Nevertheless, considering only certain subclasses
of exponential families, the situation may simplify so that explicit statements
about support sets become possible. For instance, one of the authors discusses
support sets of small cardinality in hierarchical models, a particular kind of
exponential families [Kah10]. In this paper we find a concise characterization
of the support sets in general exponential families with the help of oriented
matroids. We hope that this will allow for further theoretical results in this
direction.

Although slightly hidden, the connection to oriented matroid theory is very
natural. The starting point, and another focus of the presentation, is the implicit
description of exponential families for discrete random variables inspired by
so-called Markov bases [GMS06]. It is described in Theorem 4. We study the
(not necessarily polynomial) equations that define the closure of the exponential
family and relate them to the oriented matroid of the sufficient statistics of the
model. In the case of rational valued sufficient statistics, our observations
reduce to the fact that the non-negative real part of a toric variety is described
by a circuit ideal. We emphasize how the proof of this fact uses arguments from
oriented matroid theory.

This paper is organized as follows. In Section 2 we develop a theory of
implicit representations of exponential families which is analogous to and inspired
by algebraic statistics [GMS06]. In contrast to the toric case we do not require
the sufficient statistics to take integer values and thereby leave the realm of
commutative algebra. What remains is the theory of oriented matroids. We
discuss what answers to the support set problem look like in the language of
oriented matroids and discuss examples coming from cyclic polytopes. These
polytopes are well known in combinatorial convexity for their extremal
properties, as stated, for instance, in the Upper Bound Theorem. In Section 3 we
discuss the basics of the theory of oriented matroids and reformulate statements
from Section 2 in this language, making the connection as clear as possible.

2 Exponential families 

We assume a finite set $\mathcal{X} = \{1, \dots, m\}$ and denote by $\mathcal{P}(\mathcal{X})$ the open simplex of
probability measures with full support on $\mathcal{X}$. The closure of any set $M \subseteq \mathbb{R}^m$,
in the standard topology of $\mathbb{R}^m$, is denoted by $\overline{M}$. Any vector $n \in \mathbb{R}^m$ can
be decomposed into its positive and negative part $n = n^+ - n^-$ via
$n^+(x) := \max(n(x), 0)$ and $n^-(x) := \max(-n(x), 0)$. For any two vectors $n, p \in \mathbb{R}^m$ we
define

$$p^n := \prod_{x \in \mathcal{X}} p(x)^{n(x)} \tag{2}$$

whenever this product is well defined (e.g. when $n$ and $p$ are both non-negative).

Let $q$ be a positive measure on $\mathcal{X}$ with full support, and let $A \in \mathbb{R}^{d \times m}$ be a
matrix of width $m$. We denote by $a_x$, $x \in \mathcal{X}$, the columns of $A$. Then we have

Definition 1. The exponential family associated to the reference measure $q$
and the matrix $A$ is the set of probability measures

$$\mathcal{E}_{q,A} := \left\{ p_\theta \in \mathcal{P}(\mathcal{X}) : p_\theta(x) = \frac{q(x)}{Z_\theta} \exp\left(\theta^T a_x\right),\ \theta \in \mathbb{R}^d \right\}, \tag{3}$$

where $Z_\theta := \sum_{x \in \mathcal{X}} q(x) \exp\left(\theta^T a_x\right)$ ensures normalization.

If $q(x) = 1$ for all $x \in \mathcal{X}$, i.e. if $q$ is the uniform measure on $\mathcal{X}$, then the
corresponding exponential family is abbreviated by $\mathcal{E}_A$.
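The parametrization in Definition 1 can be evaluated directly. The following is a minimal sketch, not part of the original text: the function name `exponential_family_point` and the small example matrix (the points 0, 1, 2, 3 on a line, together with a row of ones) are illustrative assumptions.

```python
import math

def exponential_family_point(A, theta, q=None):
    """Return p_theta(x) = q(x) exp(theta^T a_x) / Z_theta for all columns a_x of A.

    A is given as a list of rows; q defaults to the uniform reference measure.
    """
    m = len(A[0])
    if q is None:
        q = [1.0] * m
    # theta^T a_x for every column a_x of A
    scores = [sum(theta[i] * A[i][x] for i in range(len(A))) for x in range(m)]
    weights = [q[x] * math.exp(scores[x]) for x in range(m)]
    Z = sum(weights)  # normalization constant Z_theta
    return [w / Z for w in weights]

# Hypothetical example: a row of ones plus the sufficient statistic x -> x.
A_line = [[1, 1, 1, 1],
          [0, 1, 2, 3]]
p_example = exponential_family_point(A_line, theta=[0.0, 0.5])
```

With `theta = [0.0, 0.0]` and uniform `q` the function returns the uniform distribution, as expected from (3).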

In the following we always assume that the matrix $A$ has the vector $(1, \dots, 1)$
in its row span. This means that there exists a dual vector $l_1 \in (\mathbb{R}^d)^*$ which
satisfies $l_1(a_x) = 1$ for all $x \in \mathcal{X}$. There is no loss of generality in this assumption
as we can always add an additional row $(1, \dots, 1)$ to $A$ without changing the
exponential family.

Remark 2. The exponential family depends on $A$ only through its row span $\mathcal{L}$.
Different matrices with the same row span lead to different parametrizations of
the same exponential family. In the following it will be convenient to fix one
parametrization, hence we work with matrices $A$ instead of vector spaces $\mathcal{L}$.

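The geometrical structure of the boundary of $\overline{\mathcal{E}}_{q,A}$ is encoded in the polytope
of possible values that the map $A : \overline{\mathcal{P}(\mathcal{X})} \to \mathbb{R}^d$, $p \mapsto Ap$ takes:

Definition 3. The convex support of $\mathcal{E}_{q,A}$ is the polytope

$$\operatorname{cs}(\mathcal{E}_{q,A}) := \operatorname{conv}\{a_x : x \in \mathcal{X}\}. \tag{4}$$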

In the context of hierarchical models, the convex support is also called 
marginal polytope. 

We will see later that the faces of $\operatorname{cs}(\mathcal{E}_{q,A})$ are in a one-to-one correspondence
with the different support sets occurring in $\overline{\mathcal{E}}_{q,A}$. Even more is true: The
mapping $A$, restricted to $\overline{\mathcal{E}}_{q,A}$, defines a homeomorphism $\overline{\mathcal{E}}_{q,A} \to \operatorname{cs}(\mathcal{E}_{q,A})$ which
maps every probability measure $p \in \overline{\mathcal{E}}_{q,A}$ into the face corresponding to its
support, see for example [BN78]. This homeomorphism is called the moment
map. One can use the properties of the moment map to prove Theorem 15 using
arguments from the theory of oriented matroids. This will be discussed in the
next section.

Note that the parametrization in (3) does not extend to the boundary. This
is one of the motivations to move on to an implicit description of the exponential
family. The next theorem shows how to obtain an implicit description of $\overline{\mathcal{E}}_{q,A}$
from the kernel of $A$. This gives a nice "duality" as the parametrization itself
is derived from the image of $A$.

Theorem 4. A distribution $p$ is an element of the closure of $\mathcal{E}_{q,A}$ if and only
if all the equations

$$p^{n^+} q^{n^-} = p^{n^-} q^{n^+} \quad \text{for all } n \in \ker A \tag{5}$$

hold for $p$.
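For the uniform reference measure the equations of Theorem 4 read $p^{n^+} = p^{n^-}$ and are easy to check numerically. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the example uses the matrix with rows $(1,1,1,1)$ and $(0,1,2,3)$, whose kernel contains $(1,-2,1,0)$ and $(0,1,-2,1)$.

```python
import math

def monomial(p, n):
    """p^n = prod_x p(x)^n(x) for a non-negative exponent vector n (0**0 == 1)."""
    return math.prod(px ** nx for px, nx in zip(p, n))

def pos_neg_parts(n):
    """Split n into its positive part n^+ and negative part n^-."""
    return [max(v, 0) for v in n], [max(-v, 0) for v in n]

def satisfies_binomial(p, n, tol=1e-12):
    """Check p^{n^+} = p^{n^-}, i.e. equation (5) with uniform reference measure."""
    n_plus, n_minus = pos_neg_parts(n)
    return abs(monomial(p, n_plus) - monomial(p, n_minus)) <= tol

# Two kernel vectors of the assumed example matrix [[1,1,1,1],[0,1,2,3]].
kernel_vectors = [[1, -2, 1, 0], [0, 1, -2, 1]]

# A member of the family: p(x) proportional to exp(0.5 * x), x = 0,...,3.
weights = [math.exp(0.5 * x) for x in range(4)]
p_interior = [w / sum(weights) for w in weights]

p_vertex = [1.0, 0.0, 0.0, 0.0]      # a point mass, lies in the closure
p_not_face = [0.5, 0.0, 0.5, 0.0]    # support {1,3} is not a face of the segment
```

In this example the interior point and the point mass pass both checks, while `p_not_face` violates the first equation.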
Remark 5. This theorem is a direct generalization of Theorem 3.2 in [GMS06].
There only the polynomial equations among (5) are studied under the additional
assumption that $A$ has only integer entries. Moreover, only the uniform reference
measure was considered. However, the proof of the theorem generalizes without
any major problem. Actually, the proof of our theorem here needs one step less,
since we don't need to show the reduction to the polynomial equations. The
different flavor of the results will be made more precise in Remark 13 later.

Our proof closely follows [GMS06]. In our presentation of the proof we want
to explicitly point out how matroid-type arguments are used, the first example
being Lemma 7.

Before giving the proof of Theorem 4 we first state a couple of auxiliary
results which are of independent interest. The matrix $A$ and derived objects are
fixed for the rest of the considerations. A face of a polytope $P$ is the intersection
of the polytope with an affine hyperplane $H$, such that all $x \in P$ with $x \notin H$
lie on one side of the hyperplane. Faces of maximal dimension are called facets.
It is a fundamental result that every polytope can equivalently be described as
a compact set defined by finitely many inequalities (i.e. facets), see [Zie94].

In particular we are interested in the face structure of $\operatorname{cs}(\mathcal{E}_{q,A})$. Since we
assumed that all columns of $A$ lie in the affine hyperplane $l_1 = 1$, we can
replace every affine hyperplane $H$ by an equivalent central hyperplane (which
passes through the origin). This motivates the following

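Definition 6. Let $\{a_x : x \in \mathcal{X}\}$ be the vertex set of a polytope. A set $F \subseteq \mathcal{X}$
is called facial if there exists a vector $c \in \mathbb{R}^d$ such that

$$c^T a_x = 0 \ \text{ for all } x \in F, \qquad c^T a_x \ge 1 \ \text{ for all } x \notin F. \tag{6}$$

Lemma 7. Fix a matrix $A = (a_x)_{x \in \mathcal{X}} \in \mathbb{R}^{d \times m}$ and a nonempty subset $F \subseteq \mathcal{X}$.
Then we have:

• If $F$ is facial then no non-zero non-negative linear combination of the $a_x$,
$x \notin F$, can be written as a linear combination of the $a_x$, $x \in F$.

• $F$ is facial if and only if for any $u \in \ker A$:

$$\operatorname{supp}(u^+) \subseteq F \iff \operatorname{supp}(u^-) \subseteq F. \tag{7}$$

• If $p$ is a solution to (5), then $\operatorname{supp}(p)$ is facial.

Proof. For the first statement, assume to the contrary that we can find $\alpha(x) \ge 0$
and $\beta(x)$, not all zero, such that $u = \sum_{x \notin F} \alpha(x) a_x = \sum_{x \in F} \beta(x) a_x$, and let $c$ be
as in (6). We have

$$0 = \sum_{x \in F} \beta(x)\, c^T a_x = c^T u = \sum_{x \notin F} \alpha(x)\, c^T a_x \ge \sum_{x \notin F} \alpha(x) \ge 0,$$

whence $\alpha(x) = 0$ for all $x \notin F$. This also proves the first direction of the second
statement.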
The opposite direction is a bit more complicated and uses Farkas' Lemma
(see for example [Zie94]): Let $B \in \mathbb{R}^{l \times d}$ and $z \in \mathbb{R}^l$. Either there exists a point
in the polyhedron $\{x : Bx \le z\}$, or there exists a non-negative vector $y \in \mathbb{R}^l_{\ge 0}$
with $y^T B = 0$ and $y^T z < 0$, but not both. Assume that $F \subseteq \mathcal{X}$ is nonempty
and satisfies (7) for all $u \in \ker A$. Let $B$ be the $(|F| + m) \times d$ matrix with
rows $\{a_x^T : x \in F\}$, $\{-a_x^T : x \in F\}$, $\{-a_x^T : x \notin F\}$, and $z$ be the vector which
has entries zero in the first $2|F|$ components and entries $-1$ in the last $m - |F|$.
Then a solution to $Bx \le z$ provides a facial vector. Thus it remains to show that
each non-negative $y = (y^{(1)}, y^{(2)}, y^{(3)})^T$, decomposed according to the rows of
$B$, with $y^T B = 0$ satisfies $y^T z \ge 0$. Assume that the columns of $A$ are ordered
such that the columns with indices $x \in F$ come first. Then $y^{(3)}$ must be zero as
otherwise $(y^{(1)} - y^{(2)}, -y^{(3)})^T \in \ker A$ would violate (7) by non-negativity of $y$.
But then $y^T z = 0$ trivially.

The last statement follows immediately from the second statement. □

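Now we are ready for the proof of Theorem 4.

Proof of Theorem 4. The first thing to note is that it is enough to prove the
theorem when $q(x) = 1$ for all $x$. To see this note that $p \in \overline{\mathcal{E}}_A$ if and only if
$\lambda q p \in \overline{\mathcal{E}}_{q,A}$, where $\lambda > 0$ is a normalizing constant, which does not appear in
equations (5) since they are homogeneous.

Denote by $Z_A$ the set of solutions of (5). We first show that $\mathcal{E}_A$ satisfies the
equations defining $Z_A$. We plug in the parametrization to find, for every $n \in \ker A$,

$$p_\theta^{n^+} = \prod_{x \in \mathcal{X}} p_\theta(x)^{n^+(x)} = Z_\theta^{-\sum_x n^+(x)}\, e^{\theta^T A n^+} = Z_\theta^{-\sum_x n^-(x)}\, e^{\theta^T A n^-} = \prod_{x \in \mathcal{X}} p_\theta(x)^{n^-(x)} = p_\theta^{n^-}. \tag{8}$$

Here we used $A n^+ = A n^-$ and $\sum_x n^+(x) = \sum_x n^-(x)$, which hold because $n \in \ker A$
and $(1, \dots, 1)$ lies in the row span of $A$. Thus $\mathcal{E}_A \subseteq Z_A$, and also $\overline{\mathcal{E}}_A \subseteq \overline{Z_A} = Z_A$.

Next, let $p \in Z_A \setminus \mathcal{E}_A$ and write $F := \operatorname{supp}(p)$; by Lemma 7, $F$ is facial. We
construct a sequence $p_{(\mu)}$ in $\mathcal{E}_A$ that converges to $p$ as $\mu \to -\infty$.

Consider the following system of linear equations in the unknown vector $d$ (with
one component for each row of $A$):

$$d^T a_x = \log p(x) \quad \text{for all } x \in \operatorname{supp}(p). \tag{9}$$

We claim that this linear system has a solution. Otherwise we can find numbers
$v(x)$, $x \in F$, such that $\sum_x v(x) \log p(x) \ne 0$ and $\sum_x v(x) a_x = 0$. This leads to
the contradiction $p^{v^+} \ne p^{v^-}$.

Fix a vector $c \in \mathbb{R}^d$ with property (6) and for any $\mu \in \mathbb{R}$ define

$$p_{(\mu)} := p_{\mu c + d} = \frac{1}{Z_{\mu c + d}} \left( e^{\mu c^T a_1} e^{d^T a_1}, \dots, e^{\mu c^T a_m} e^{d^T a_m} \right) \in \mathcal{E}_A.$$

By (6) it is clear that $\lim_{\mu \to -\infty} p_{(\mu)} = p$. This proves the theorem. □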

We now see that the last statement of Lemma 7 can be generalized [GMS06,
Lemma A.2]:

Proposition 8. The following are equivalent for any set $F \subseteq \mathcal{X}$:

1. $F$ is facial.

2. The uniform distribution $\frac{1}{|F|} \mathbb{1}_F$ of $F$ lies in $\overline{\mathcal{E}}_A$.

3. There is a vector with support $F$ in $\overline{\mathcal{E}}_A$.

According to Theorem 4, in order to test whether $p$ is an element of the closure
of $\mathcal{E}_{q,A}$, we have to test all the equations (5). The next theorem shows that
it is actually enough to check finitely many equations. For this, we need the
following notion from matroid theory: A circuit vector of a matrix $A$ is a nonzero
vector $n \in \mathbb{R}^m$ corresponding to a linear dependency $\sum_x n(x) a_x = 0$ with inclusion
minimal support, i.e. if $n' \in \ker A$ satisfies $\operatorname{supp}(n') \subseteq \operatorname{supp}(n)$, then $n' = \lambda n$
for some $\lambda \in \mathbb{R}$. Equivalently, $n$ is an element of $\ker A$ with inclusion minimal
support.

A circuit is the support set of a circuit vector. The minimality condition
implies that the circuit determines its corresponding circuit vectors up to a
multiple. A circuit basis $C$ contains one circuit vector for every circuit.¹

If we replace $n$ by a nonzero multiple of $n$ then equation (5) is replaced by
an equation which is equivalent over the non-negative reals. This means that
all systems of equations corresponding to any circuit basis $C$ are equivalent.
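For small matrices a circuit basis can be found by brute force: a set of columns is a circuit exactly when the corresponding submatrix has a one-dimensional kernel spanned by a vector without zero entries. The sketch below illustrates this with exact rational arithmetic; it is an assumption-laden illustration (the function names and the example matrix are not from the text), and it is exponential in the number of columns, so it is only meant for toy examples.

```python
from fractions import Fraction
from itertools import combinations

def nullspace(rows):
    """Basis of {v : rows * v = 0}, via exact Gauss-Jordan elimination over Q."""
    M = [[Fraction(v) for v in row] for row in rows]
    ncols = len(M[0])
    pivots = []
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [v / M[r][c] for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    free = [c for c in range(ncols) if c not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * ncols
        v[f] = Fraction(1)
        for row_index, c in enumerate(pivots):
            v[c] = -M[row_index][f]
        basis.append(v)
    return basis

def circuit_vectors(A):
    """One circuit vector per circuit of A (A given as a list of rows)."""
    m = len(A[0])
    found = []
    for size in range(1, m + 1):
        for cols in combinations(range(m), size):
            sub = [[row[c] for c in cols] for row in A]
            basis = nullspace(sub)
            # circuit: kernel of the submatrix is a line with full support
            if len(basis) == 1 and all(v != 0 for v in basis[0]):
                n = [Fraction(0)] * m
                for c, v in zip(cols, basis[0]):
                    n[c] = v
                found.append(n)
    return found

# Assumed toy example: four points 0, 1, 2, 3 on a line, with a row of ones.
A_line = [[1, 1, 1, 1],
          [0, 1, 2, 3]]
line_circuits = circuit_vectors(A_line)
```

For this example every three columns are minimally dependent, so the sketch returns four circuit vectors, the first being $(1, -2, 1, 0)$.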

Theorem 9. Let $\mathcal{E}_{q,A}$ be an exponential family. Then $\overline{\mathcal{E}}_{q,A}$ equals the set of all
probability distributions that satisfy

$$p^{c^+} q^{c^-} = p^{c^-} q^{c^+} \quad \text{for all } c \in C, \tag{10}$$

where $C$ is a circuit basis of $A$.

The proof is based on the following two lemmas:

Lemma 10. For every vector $n \in \ker A$ there exists a sign-consistent circuit
vector $c \in \ker A$, i.e. if $c(x) \ne 0$ then $\operatorname{sgn} c(x) = \operatorname{sgn} n(x)$ for all $x \in \mathcal{X}$.

Proof. Let $c$ be a vector with inclusion-minimal support which is sign-consistent
with $n$ and satisfies $\operatorname{supp}(c) \subseteq \operatorname{supp}(n)$. If $c$ is not a circuit, then there exists a
circuit $c'$ with $\operatorname{supp}(c') \subseteq \operatorname{supp}(c)$. Using a suitable linear combination $c + \alpha c'$,
$\alpha \in \mathbb{R}$, we can obtain a contradiction to the minimality of $c$. □

Lemma 11. Every vector $n \in \ker A$ is a finite sign-consistent sum of circuit
vectors $n = \sum_{i=1}^r c_i$, i.e. if $c_i(x) \ne 0$ then $\operatorname{sgn} c_i(x) = \operatorname{sgn} n(x)$ for all $x \in \mathcal{X}$.

Proof. Use induction on the size of $\operatorname{supp}(n)$. In the induction step, use a
sign-consistent circuit, as in the last lemma, to reduce the support. □

Proof of Theorem 9. Again, we can assume that $q(x) = 1$ for all $x \in \mathcal{X}$. By
Theorem 4 it suffices to show: If $p \in \mathbb{R}^m_{\ge 0}$ satisfies (10), then it also satisfies
$p^{n^+} = p^{n^-}$ for all $n \in \ker A$. Write $n = \sum_{i=1}^r c_i$ as a sign-consistent sum of
circuits $c_i$, as in the last lemma. Without loss of generality we can assume
$c_i \in C$ for all $i$. Then $n^+ = \sum_{i=1}^r c_i^+$ and $n^- = \sum_{i=1}^r c_i^-$. Hence $p$ satisfies

$$p^{n^+} - p^{n^-} = p^{\sum_{i=1}^{r-1} c_i^+} \left( p^{c_r^+} - p^{c_r^-} \right) + \left( p^{\sum_{i=1}^{r-1} c_i^+} - p^{\sum_{i=1}^{r-1} c_i^-} \right) p^{c_r^-}, \tag{11}$$

so the theorem follows easily by induction. □

¹ It is easy to see that a circuit basis of $\ker A$ spans $\ker A$. However, in general the circuit
vectors are not linearly independent.

The theorem implies that a finite number of equations is sufficient to
describe $\overline{\mathcal{E}}_{q,A}$. The number of equations that are necessary is bounded from above
by the number of different support sets occurring in $C$.

Example 12. Consider the following sufficient statistics:

$$A = \begin{pmatrix} -\alpha & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}, \tag{12}$$

where $\alpha \notin \{0, 1\}$ is arbitrary. The kernel is then spanned by

$$v_1 = (1, \alpha, -1, -\alpha)^T \quad \text{and} \quad v_2 = (1, \alpha, -\alpha, -1)^T. \tag{13}$$

These two generators correspond to the two relations

$$p(1)p(2)^\alpha = p(3)p(4)^\alpha, \quad \text{and} \quad p(1)p(2)^\alpha = p(3)^\alpha p(4). \tag{14}$$

It follows immediately that

$$p(3)p(4)^\alpha = p(3)^\alpha p(4). \tag{15}$$

If $p(3)p(4)$ is not zero, then we conclude $p(3) = p(4)$. However, on the boundary
this does not follow from equations (14). Possible solutions to these equations
are given by

$$p_a = (0, a, 0, 1 - a) \quad \text{for } 0 < a < 1. \tag{16}$$

However, $p_a$ does not lie in the closure of the exponential family $\mathcal{E}_A$, since all
members of $\mathcal{E}_A$ do satisfy $p(3) = p(4)$.

A circuit basis of $A$ is given by the following vectors:

$$(0, 0, 1, -1)^T: \quad p(3) = p(4), \tag{17a}$$

$$(1, \alpha, 0, -1-\alpha)^T: \quad p(1)p(2)^\alpha = p(4)^{1+\alpha}, \tag{17b}$$

$$(1, \alpha, -1-\alpha, 0)^T: \quad p(1)p(2)^\alpha = p(3)^{1+\alpha}. \tag{17c}$$
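The gap between a spanning set of the kernel and a circuit basis in Example 12 can be checked numerically. The following sketch assumes the concrete value $\alpha = 2$ and the point $(0, 0.3, 0, 0.7)$ from the family of boundary solutions; both choices and the function names are illustrative only.

```python
def spanning_equations_hold(p, alpha, tol=1e-12):
    """Equations (14): p1 p2^a = p3 p4^a and p1 p2^a = p3^a p4."""
    p1, p2, p3, p4 = p
    lhs = p1 * p2 ** alpha
    return (abs(lhs - p3 * p4 ** alpha) <= tol
            and abs(lhs - p3 ** alpha * p4) <= tol)

def circuit_equations_hold(p, alpha, tol=1e-12):
    """Equations (17a)-(17c) coming from the circuit basis."""
    p1, p2, p3, p4 = p
    lhs = p1 * p2 ** alpha
    return (abs(p3 - p4) <= tol
            and abs(lhs - p4 ** (1 + alpha)) <= tol
            and abs(lhs - p3 ** (1 + alpha)) <= tol)

alpha_example = 2.0                       # assumed value, any positive alpha != 1 works similarly
p_boundary = (0.0, 0.3, 0.0, 0.7)         # satisfies (14) but not (17a)
p_uniform = (0.25, 0.25, 0.25, 0.25)      # satisfies all equations
```

The boundary point passes the two equations obtained from the spanning set but fails (17a), which is the phenomenon the example describes.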

Remark 13 (Relation to algebraic statistics). In the particular case where the
vector space $\ker A$ has a basis with integer components (for example, if $A$ itself
has only integer entries), every circuit is proportional to a circuit with integer
components. In this case the corresponding equations (5) are polynomial, and
the theorem implies that $\overline{\mathcal{E}}_A$ is the non-negative real part of a projective variety,
i.e. the solution set of homogeneous polynomials. If we want to use the tools of
commutative algebra and algebraic geometry, then it turns out that circuits are
not the right object to consider: For example, proportional circuits only yield
equivalent equations if we consider them over the non-negative reals, but we
may obtain a different solution set if we allow negative real solutions or complex
solutions, which may greatly increase the running time of many algorithms of
computational commutative algebra. Hence, if we want to use algebraic tools,
it is best to work with a Markov basis, which can be defined as a finite set of
kernel vectors such that the solution set over $\mathbb{C}$ of the corresponding equations
equals the Zariski closure of $\mathcal{E}$, i.e. the smallest variety containing $\mathcal{E}$.² In this
algebraic setting, Theorem 4 remains valid if we replace "closure" by "Zariski
closure" and $\ker A$ by the integer kernel $\ker_{\mathbb{Z}} A$. This fact was first noted in
[DS98].

In the algebraic case one can also look at the ideal (see [CLO08]) generated
by all polynomial equations induced by integer valued circuit vectors. This
ideal is called the circuit ideal. By what was said above this ideal is in general
smaller than the associated toric ideal, which contains the polynomial equations
induced by all integer valued kernel vectors. Circuit ideals have been studied
already in the seminal paper [ES96]. For further results illuminating their nice
relations to polyhedral geometry we refer to [BJT07].

Finding a Markov basis is in general a non-trivial task, see [HM09]. It seems
to be much easier to compute the circuits of a matrix. However, a minimal
Markov basis is usually much smaller than a circuit basis, and thus it is easier
to handle (but cf. the next remark). For experiments in this direction we
recommend the open source software package 4ti2 [4ti2] which can compute circuits
as well as Markov bases.

Remark 14. Using arguments from matroid theory the number of circuits can
be shown to be less than or equal to $\binom{m}{r+2}$, where $m = |\mathcal{X}|$ is the size of the state
space and $r$ is the dimension of the exponential family $\mathcal{E}_{q,A}$, see [DSL04]. This
gives us an upper bound on the number of implicit equations which are necessary
to describe $\overline{\mathcal{E}}_{q,A}$. Note that $\binom{m}{r+2}$ is usually much larger than the codimension
$m - r - 1$ of $\overline{\mathcal{E}}_{q,A}$ in the probability simplex. In contrast to this, if we only want
to find an implicit description of all probability distributions of $\overline{\mathcal{E}}_{q,A}$ which
have full support, then $m - r - 1$ equations are enough: We can test $p \in \mathcal{E}_{q,A}$
by checking whether $\log(p/q)$ lies in the row span of $A$. This amounts to
checking whether $\log(p/q)$ is orthogonal to $\ker A$, which is equivalent to $m - r - 1$
equations, once we have chosen a basis of $\ker A$.
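The full-support test described in Remark 14 amounts to a handful of inner products. The sketch below assumes the toy matrix with rows $(1,1,1,1)$ and $(0,1,2,3)$, for which $m = 4$, $r = 1$ and a kernel basis with $m - r - 1 = 2$ elements is $(1,-2,1,0)$, $(0,1,-2,1)$; the function name and the tolerance are illustrative choices.

```python
import math

def in_open_family(p, q, kernel_basis, tol=1e-9):
    """Check that log(p/q) is orthogonal to every vector of a basis of ker A.

    Only meaningful for distributions p with full support.
    """
    log_ratio = [math.log(px / qx) for px, qx in zip(p, q)]
    return all(abs(sum(nx * lx for nx, lx in zip(n, log_ratio))) <= tol
               for n in kernel_basis)

kernel_basis_line = [[1, -2, 1, 0], [0, 1, -2, 1]]   # assumed example
q_uniform = [1.0, 1.0, 1.0, 1.0]

weights = [math.exp(0.5 * x) for x in range(4)]
p_member = [w / sum(weights) for w in weights]       # p(x) proportional to exp(0.5 x)
p_other = [0.1, 0.2, 0.3, 0.4]                       # full support, not in the family
```

This check says nothing about boundary points, where the logarithm is undefined; there the circuit equations are needed.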

It turns out that even in the boundary the number of equations can be
further reduced: In general we do not need all circuits for the implicit description
of $\overline{\mathcal{E}}_{q,A}$. For instance, in Example 12 the equations (17b) and (17c) are
equivalent given (17a), i.e. we only need two of the three circuits to describe $\overline{\mathcal{E}}_{q,A}$.
Unfortunately we do not know how to find a minimal subset of circuits that
characterizes the closure of the exponential family. Of course, in the algebraic
case discussed in the previous remark this question is equivalent to determining
a minimal generating set of the circuit ideal among the circuits.

² It turns out that it is not so easy to find an example of a Markov basis which does not
consist of circuits. In [AT03], S. Aoki and A. Takemura give a model and a Markov basis
element which is not a circuit. Interestingly, the full Markov basis of this model is not known.
Now we focus on the following problem: Given a set $S \subseteq \mathcal{X}$, is there a
probability distribution $p \in \overline{\mathcal{E}}_{q,A}$ satisfying $\operatorname{supp}(p) = S$? In other words, we
want to characterize the set

$$\mathcal{S}(q, A) := \{\operatorname{supp}(p) : p \in \overline{\mathcal{E}}_{q,A}\} \subseteq 2^{\mathcal{X}}. \tag{18}$$

Proposition 8 gives the following characterization: A nonempty set $S \subseteq \mathcal{X}$ is
the support set of some distribution $p \in \overline{\mathcal{E}}_{q,A}$ if and only if the following holds
for all circuit vectors $n \in \ker A$:

• $\operatorname{supp}(n^+) \subseteq S$ if and only if $\operatorname{supp}(n^-) \subseteq S$.

Obviously, this condition does not depend on the circuits themselves, but only
on the supports of their positive and negative part. In order to formalize this
observation, consider the map

$$\operatorname{sgn} : n \mapsto (\operatorname{supp}(n^+), \operatorname{supp}(n^-)),$$

which associates to each vector a pair of disjoint subsets of $\mathcal{X}$. Such a pair of
disjoint subsets shall be called a signed subset of $\mathcal{X}$ in the following. Alternatively,
signed subsets $(A, B)$ can also be represented as sign vectors $X \in \{-1, 0, +1\}^{\mathcal{X}}$,
where

$$X_x := \begin{cases} +1, & \text{if } x \in A, \\ -1, & \text{if } x \in B, \\ 0, & \text{else.} \end{cases} \tag{19}$$

In this representation, sgn corresponds to the usual signum mapping extended
to vectors. As a slight abuse of notation, we don't make a difference between
these two representations in the following.

The signed subset $\operatorname{sgn}(c)$ corresponding to a circuit $c \in \ker A$ shall be called
an oriented circuit. The set of all oriented circuits is denoted by

$$\mathcal{C}(A) := \pm \operatorname{sgn}(C) = \{\operatorname{sgn}(c) : c \in C \text{ or } c \in -C\}, \tag{20}$$

where $C$ is a circuit basis of $A$.

We immediately have the following

Theorem 15. Let $S$ be a nonempty subset of $\mathcal{X}$. Then $S \in \mathcal{S}(q, A)$ if and only if the
following holds for all signed circuits $(A, B) \in \mathcal{C}(A)$:

$$A \subseteq S \iff B \subseteq S. \tag{21}$$

Corollary 16. If two matrices $A_1$, $A_2$ satisfy $\mathcal{C}(A_1) = \mathcal{C}(A_2)$ then the possible
support sets of the corresponding exponential families $\mathcal{E}_{q_1,A_1}$ and $\mathcal{E}_{q_2,A_2}$ coincide.

According to Remark 14, Theorem 15 gives us up to $\binom{m}{r+2}$ conditions on the
support. Usually, some of these conditions are redundant, but it is not easy
to see a priori which conditions are essential. Of course, a necessary condition
for a subset $S$ of $\mathcal{X}$ to be a support set of a distribution contained in $\overline{\mathcal{E}}_A$ is
condition (21) restricted to pairs from a subset $\mathcal{H} \subseteq \mathcal{C}(A)$. For example, one
can take $\mathcal{H} := \operatorname{sgn}(B)$, where $B$ is a finite subset of $\ker A$, such as a basis.
Example 17. Let's continue Example 12. From the circuits we deduce the
following implications:

$$p(3) \ne 0 \iff p(4) \ne 0, \tag{22a}$$

$$p(1) \ne 0 \text{ and } p(2) \ne 0 \iff p(4) \ne 0, \tag{22b}$$

$$p(1) \ne 0 \text{ and } p(2) \ne 0 \iff p(3) \ne 0. \tag{22c}$$

Again, as above, the last two implications are equivalent given the first.

From this it follows easily that the possible support sets in this example
are $\{1\}$, $\{2\}$ and $\{1, 2, 3, 4\}$. From the spanning set (13) we only obtain the
implication

$$p(1) \ne 0 \text{ and } p(2) \ne 0 \iff p(3) \ne 0 \text{ and } p(4) \ne 0. \tag{23}$$
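The support sets in Example 17 can be recovered by enumerating all subsets and testing the condition of Theorem 15 against the oriented circuits. The sketch below hard-codes the signed circuits of Example 12 under the assumption $\alpha > 0$ (so that the sign patterns are as in (17a) to (17c)); the function names are hypothetical. Because condition (21) is symmetric in the two parts of a signed circuit, one orientation per circuit suffices.

```python
from itertools import combinations

def is_support_set(S, signed_circuits):
    """Condition (21): for every signed circuit (pos, neg), pos <= S iff neg <= S."""
    S = set(S)
    return all((pos <= S) == (neg <= S) for pos, neg in signed_circuits)

def support_sets(ground_set, signed_circuits):
    """All nonempty subsets of the ground set that satisfy condition (21)."""
    ground = sorted(ground_set)
    result = []
    for size in range(1, len(ground) + 1):
        for S in combinations(ground, size):
            if is_support_set(S, signed_circuits):
                result.append(set(S))
    return result

# Signed circuits (positive part, negative part) of Example 12, assuming alpha > 0.
circuits_ex12 = [({3}, {4}), ({1, 2}, {4}), ({1, 2}, {3})]
supports_ex12 = support_sets({1, 2, 3, 4}, circuits_ex12)

# Using only the sign pattern of the spanning vector v1 gives a weaker test, cf. (23).
supports_from_spanning = support_sets({1, 2, 3, 4}, [({1, 2}, {3, 4})])
```

The enumeration over all subsets is exponential in $|\mathcal{X}|$ and is only meant for small examples.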

We conclude this section with two examples where a complete characteriza- 
tion of the face lattice of the convex support and thus of the possible supports 
is easily achievable. 

Example 18 (Supports in the binary no-$n$-way interaction model). Consider the
binary hierarchical model [KWA09] whose simplicial complex is the boundary of
an $n$ simplex. If $n = 3$, this model is called the no-3-way interaction model and
its Markov bases have been recognized to be arbitrarily complicated [LO06], so
we cannot hope to find an easy description of the oriented circuits. However,
if we restrict ourselves to binary variables $x = (x_i)_{i=1}^n \in \mathcal{X} := \{0, 1\}^n$, the
structure is very simple. In this case the exponential family is of dimension
$2^n - 2$, i.e. of codimension 1 in the simplex, so $\ker A$ is one dimensional. It is
spanned by the "parity function":

$$e_N(x) := \begin{cases} -1, & \text{if } \sum_{i=1}^n x_i \text{ is odd}, \\ 1, & \text{otherwise.} \end{cases} \tag{24}$$

Using Theorem 15 we can easily describe the face lattice of the marginal polytope
(i.e. convex support) $P^{(n-1)}$: A proper subset $\mathcal{Y} \subsetneq \{0, 1\}^n$ is a support set if and only if
it contains neither all configurations with even parity nor all configurations
with odd parity. It follows that $P^{(n-1)}$ is neighborly, i.e. the convex hull of any
$\left\lfloor \dim(P^{(n-1)})/2 \right\rfloor = 2^{n-1} - 1$ vertices is a face of the polytope. To see this, note
that no set of cardinality less than $2^{n-1}$ can contain all configurations with even
or odd parity. We can easily count the support sets by counting the non-faces
of the corresponding marginal polytope, i.e. all sets $\mathcal{Y}$ that contain either the
configurations with even parity, or the configurations with odd parity. Let $s_k$
be the number of support sets of cardinality $k$, i.e. the number of faces with
$k$ vertices. It is given by:

$$s_k = \binom{2^n}{k} - 2 \binom{2^{n-1}}{k - 2^{n-1}}, \tag{25}$$

where $\binom{m}{l} = 0$ if $l < 0$. Since this polytope has only one affine dependency
which includes all the vertices, we see that it is simplicial, i.e. all its faces
are simplices. It follows that $f_k$, the number of $k$-dimensional faces, is given by

$$f_k = s_{k+1}.$$

Altogether we have determined the face lattice of the polytope, which means
that we know the "combinatorial type" of the polytope. It turns out that the
face lattice of $P^{(n-1)}$ is isomorphic to the face lattice of the $(2^n - 2)$-dimensional
cyclic polytope with $2^n$ vertices.
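The count of support sets in Example 18 can be cross-checked by brute force for small $n$. The sketch below is an illustration under assumptions: it takes the count of proper support sets of cardinality $k < 2^n$ to be $\binom{2^n}{k} - 2\binom{2^{n-1}}{k - 2^{n-1}}$ (binomials with negative lower index being zero) and compares it with a direct enumeration of the sets that contain neither all even nor all odd configurations. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb

def count_support_sets_brute_force(n):
    """Count, for each k < 2^n, the k-subsets containing neither parity class."""
    configs = list(product([0, 1], repeat=n))
    even = {x for x in configs if sum(x) % 2 == 0}
    odd = set(configs) - even
    counts = {}
    for k in range(1, len(configs)):
        counts[k] = sum(1 for S in combinations(configs, k)
                        if not even <= set(S) and not odd <= set(S))
    return counts

def count_support_sets_formula(n, k):
    """Closed formula: all k-subsets minus those containing a full parity class."""
    half = 2 ** (n - 1)
    excess = k - half
    non_faces = 2 * comb(half, excess) if excess >= 0 else 0
    return comb(2 ** n, k) - non_faces

brute_force_n3 = count_support_sets_brute_force(3)
```

For $n = 2$ the marginal polytope is a square, and both counts give four vertices and four edges.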

Next, we take a closer look at cyclic polytopes. Define the moment curve in
$\mathbb{R}^d$ by

$$x : \mathbb{R} \to \mathbb{R}^d, \quad t \mapsto x(t) := (t, t^2, \dots, t^d)^T. \tag{26}$$

The $d$-dimensional cyclic polytope with $n$ vertices is

$$C(d, n) := \operatorname{conv}\{x(t_1), \dots, x(t_n)\}, \tag{27}$$

the convex hull of $n > d$ distinct points ($t_1 < t_2 < \dots < t_n$) on the moment
curve. The face lattice of a cyclic polytope can easily be described using Gale's
evenness condition, see [Zie94]. The cyclic polytope is simplicial and neighborly,
i.e. the convex hull of any $\lfloor d/2 \rfloor$ vertices is a face of $C(d, n)$, but even better, one
has

Theorem 19 (Upper Bound Theorem). If $P$ is a $d$-dimensional polytope with
$n = f_0$ vertices, then for every $k$ it has at most as many $k$-dimensional faces as
the cyclic polytope $C(d, n)$:

$$f_k(P) \le f_k(C(d, n)), \quad k = 0, \dots, d. \tag{28}$$

If equality holds for some $k$ with $\lfloor d/2 \rfloor \le k < d$ then $P$ is neighborly.

Theorem 19 was conjectured by Motzkin in 1957 and its proof has a long
and complicated history. The final result is due to McMullen [McM70].

The Upper Bound Theorem shows that the exponential families constructed 
above have the largest number of support sets among all exponential families 
with the same dimension and the same number of vertices. Finally, we consider 
a cyclic polytope of dimension two which also gives an interesting exponential 
family, answering the question for the exponential family of smallest dimension 
containing all the vertices of the probability simplex. The construction is due
to [MA04].

Example 20. Let $\mathcal{X} = \{1, \dots, m\}$ and consider the matrix $A$, whose columns are
the points on the 2-dimensional moment curve, augmented with row $(1, \dots, 1)$:

$$A = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 3 & \cdots & m \\ 1 & 4 & 9 & \cdots & m^2 \end{pmatrix}. \tag{29}$$

This matrix defines a two-dimensional exponential family. To approximate
an arbitrary extreme point $\delta_j$ of the probability simplex, consider the
parameter vector $\theta = (j^2, -2j, 1)^T$, giving rise to probability measures
$p_{\beta\theta} = \frac{1}{Z} \exp(-\beta \theta^T A)$. Since $\theta^T A_i = (i - j)^2$, we get that $\lim_{\beta \to \infty} p_{\beta\theta} = \delta_j$.
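The convergence to a point mass in Example 20 is easy to observe numerically. The sketch below assumes the parameter vector $\theta = (j^2, -2j, 1)$, which is the choice that yields $\theta^T A_i = (i - j)^2$ for the columns $A_i = (1, i, i^2)^T$; the function name and the numerical values of $m$, $j$ and $\beta$ are illustrative.

```python
import math

def moment_curve_family(m, j, beta):
    """Distribution proportional to exp(-beta * theta^T A_i) on {1,...,m}."""
    theta = (j * j, -2 * j, 1)
    # theta^T A_i = j^2 - 2 j i + i^2 = (i - j)^2 for the column A_i = (1, i, i^2)
    energies = [theta[0] + theta[1] * i + theta[2] * i * i for i in range(1, m + 1)]
    weights = [math.exp(-beta * e) for e in energies]
    Z = sum(weights)
    return [w / Z for w in weights]

p_mild = moment_curve_family(m=5, j=3, beta=0.1)    # still spread out
p_sharp = moment_curve_family(m=5, j=3, beta=50.0)  # essentially the point mass at j = 3
```

As $\beta$ grows, all mass concentrates on the state $j$, since it is the unique minimizer of $(i - j)^2$.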

Summarizing we see that cyclic polytopes, owing to their extremal proper- 
ties, have something to offer not only for convex geometry, but also for statistics. 
3 Relations to Oriented Matroids

In this section the results from the previous section are related to the theory of
oriented matroids. The proofs in this section are only sketched, since the main
results of this work have already been proved directly. We refer to chapters 1
to 3 of [BVS+93] for a more detailed introduction to oriented matroids.

Let $E$ be a finite set and $\mathcal{C}$ a non-empty collection of signed subsets of $E$
(see the previous section). For every signed set $X = (X^+, X^-)$ of $E$ we let
$\underline{X} := X^+ \cup X^-$ denote the support of $X$. Furthermore, the opposite signed set
is $-X = (X^-, X^+)$. Then the pair $(E, \mathcal{C})$ is called an oriented matroid if the
following conditions are satisfied:

(C1) $\mathcal{C} = -\mathcal{C}$,

(C2) for all $X, Y \in \mathcal{C}$, if $\underline{X} \subseteq \underline{Y}$, then $X = Y$ or $X = -Y$, (incomparability)

(C3) for all $X, Y \in \mathcal{C}$, $X \ne -Y$, and $e \in X^+ \cap Y^-$ there is a $Z \in \mathcal{C}$ such that
$Z^+ \subseteq (X^+ \cup Y^+) \setminus \{e\}$ and $Z^- \subseteq (X^- \cup Y^-) \setminus \{e\}$. (weak elimination)

In this case each element of $\mathcal{C}$ is called a signed circuit.
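The three circuit axioms can be tested mechanically on a small collection of signed sets. The sketch below represents a signed set as a pair of frozensets and checks (C1), (C2) and (C3) by exhaustive search. The example collection is an assumption made for illustration: it consists of the sign patterns of the kernel vectors $(1,-2,1,0)$, $(0,1,-2,1)$, $(2,-3,0,1)$, $(1,0,-3,2)$ of the matrix with rows $(1,1,1,1)$ and $(0,1,2,3)$, together with their negatives.

```python
from itertools import product

def signed_set(signs):
    """Turn a sign vector into a pair (positive part, negative part), 1-indexed."""
    pos = frozenset(i + 1 for i, s in enumerate(signs) if s > 0)
    neg = frozenset(i + 1 for i, s in enumerate(signs) if s < 0)
    return (pos, neg)

def negate(X):
    return (X[1], X[0])

def support(X):
    return X[0] | X[1]

def is_oriented_matroid(circuits):
    """Check the axioms (C1), (C2), (C3) for a collection of signed sets."""
    C = set(circuits)
    if not C:
        return False
    # (C1) symmetry
    if any(negate(X) not in C for X in C):
        return False
    # (C2) incomparability
    for X, Y in product(C, repeat=2):
        if support(X) <= support(Y) and X != Y and X != negate(Y):
            return False
    # (C3) weak elimination
    for X, Y in product(C, repeat=2):
        if X == negate(Y):
            continue
        for e in X[0] & Y[1]:
            if not any(Z[0] <= (X[0] | Y[0]) - {e} and Z[1] <= (X[1] | Y[1]) - {e}
                       for Z in C):
                return False
    return True

# Assumed example: circuit vectors of the matrix [[1,1,1,1],[0,1,2,3]].
line_vectors = [(1, -2, 1, 0), (0, 1, -2, 1), (2, -3, 0, 1), (1, 0, -3, 2)]
positive_orientations = [signed_set(v) for v in line_vectors]
line_circuits_signed = positive_orientations + [negate(X) for X in positive_orientations]
```

Dropping the negated circuits breaks (C1), so the check returns `False` for `positive_orientations` alone.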

Note that to every oriented matroid $(E, \mathcal{C})$ we have an associated unoriented
matroid $(E, \underline{\mathcal{C}})$, called the underlying matroid, where

$$\underline{\mathcal{C}} = \{X^+ \cup X^- = \operatorname{supp}(X) : X \in \mathcal{C}\} \tag{30}$$

is the set of circuits of $(E, \underline{\mathcal{C}})$. In this way oriented matroids can be considered
as ordinary matroids endowed with an additional structure, namely a circuit
orientation which assigns two opposite signed circuits $\pm X \in \mathcal{C}$ to every circuit
$\underline{X} \in \underline{\mathcal{C}}$.

The most important example of an oriented matroid here is the oriented
matroid of a matrix $A \in \mathbb{R}^{d \times m}$. In this case let $E = \mathcal{X} = \{1, \dots, m\}$ and let

$$\mathcal{C} = \{(\operatorname{supp}(n^+), \operatorname{supp}(n^-)) : n \in \ker A \text{ has inclusion minimal support}\}. \tag{31}$$

This example is so important that oriented matroids which arise in this way are
given a name: An oriented matroid is called realizable if it is induced by some
matrix.³

The only axiom which is not trivially fulfilled for this example is (C3).
However, if we drop the minimality condition and let $\mathcal{V} = \{(\operatorname{supp}(n^+), \operatorname{supp}(n^-)) :
n \in \ker A\}$, then it is easy to see that $\mathcal{V}$ satisfies (C3). Thus $(E, \mathcal{C})$ satisfies
(C3) by the following proposition:

Proposition 21. Let $\mathcal{V}$ be a nonempty collection of signed subsets of $E$
satisfying (C1) and (C3). Write $\operatorname{Min}(\mathcal{V})$ for the minimal elements of $\mathcal{V}$ (with respect
to inclusion of supports). Then

1. for any $X \in \mathcal{V}$ there is $Y \in \operatorname{Min}(\mathcal{V})$ such that $Y^+ \subseteq X^+$ and $Y^- \subseteq X^-$.

2. $\operatorname{Min}(\mathcal{V})$ is the set of circuits of an oriented matroid.

Proof. See [BVS+93], Proposition 3.2.4. □

³ Note that this definition depends, in fact, only on the kernel of $A$, compare Remark 2.



This illustrates how (C2) corresponds to the minimality condition. It is
possible to define oriented matroids without this minimality condition using the
following construction:

For two signed subsets $X, Y$ of $E$ define the composition of $X$ and $Y$ as

$$(X \circ Y)^+ := X^+ \cup (Y^+ \setminus X^-), \qquad (X \circ Y)^- := X^- \cup (Y^- \setminus X^+). \tag{32}$$

Note that this operation is associative but not commutative in general. A
composition $X \circ Y$ is conformal if $X$ and $Y$ are sign-consistent, i.e.
$X^+ \cap Y^- = \emptyset = X^- \cap Y^+$.

An o.m. vector of an oriented matroid is any composition of an arbitrary
number of circuits.⁴ The set of o.m. vectors shall be denoted by $\mathcal{V}$. If the
oriented matroid comes from a matrix $A$, then $\mathcal{V}$ equals the set $\mathcal{V}$ from above.
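The composition of signed sets is a two-line set operation. The sketch below represents a signed set as a pair (positive part, negative part) of Python sets; the function names and the example signed sets are illustrative assumptions.

```python
def compose(X, Y):
    """Composition X o Y: X takes precedence, Y fills in where X is zero."""
    Xp, Xm = X
    Yp, Ym = Y
    return (Xp | (Yp - Xm), Xm | (Ym - Xp))

def is_conformal(X, Y):
    """X and Y are sign-consistent: no element has opposite signs in X and Y."""
    return not (X[0] & Y[1]) and not (X[1] & Y[0])

X_ex = ({1}, set())          # element 1 positive
Y_ex = (set(), {1})          # element 1 negative
W_ex = ({2}, {3})            # disjoint from X_ex, hence sign-consistent with it
```

On these examples `compose(X_ex, Y_ex)` returns `X_ex` while `compose(Y_ex, X_ex)` returns `Y_ex`, which shows that the operation is not commutative; for sign-consistent arguments such as `X_ex` and `W_ex` the order does not matter.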

The above proposition implies easily that an oriented matroid can be defined 
as a pair {E, V), where V is a collection of signed subsets satisfying (CI), (C3) 
and 



Note that in the realizable case linear combinations of vectors correspond to 
composition of their sign vectors in the following sense: 



Now Lemmas [10] and [TT] correspond to the following two lemmas 

Lemma llOf . For every o.m. vector Y there exists a sign- consistent signed cir- 
cuit X such that X dY . 

Lemma lllf . Any o.m. vector is a conformal composition of circuits. 

To every matrix A we can associate a polytope which was called convex 
support in the last section. Many properties of this polytope can be translated 
into the language of oriented matroids. This yields constructions which also 
make sense, if the oriented matroid is not realizable. In order to make this 
more precise, we need the notion of the dual oriented matroid. The general 
construction of the dual of an oriented matroid is beyond the scope of this 
work. Here, we only state the definition for realizable oriented matroids. 

In the following we assume that the matrix A has the constant vector 
(1,...,1) in its rowspace. This means that all the column vectors ax lie in 
a hyperplane = 1. In the general case, this can always be achieved by adding 

4ln tBVS+93l . o.m. vectors are simply called vectors. The name "o.m. vector" has been 
proposed by F. Matiis to avoid confusion. 



(VO) e V, 



(V2) for all X, F e V we have XoY eV, 



sgn(n + en') = sgn(n) o sgn(n'), for e > small enough. 



(33) 
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another dimension. Technically we require that the face lattice of the polytope 
spanned by the columns of A is combinatorially equivalent to the face lattice of 
the cone over the columns. See also the remarks before Definition [51 
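Relation (33) above, which links small perturbations of real vectors to the composition of their sign vectors, can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: sign vectors are encoded as tuples with entries in {-1, 0, +1}, the vector called n' in (33) is named `m` in the code, and the two sample vectors are arbitrary rational vectors chosen for this example (they are not taken from the text).

```python
from fractions import Fraction


def sgn(v):
    """Sign vector of a real vector, with entries in {-1, 0, +1}."""
    return tuple((x > 0) - (x < 0) for x in v)


def compose(s, t):
    """Composition of sign vectors: keep s where it is nonzero, otherwise take t."""
    return tuple(a if a != 0 else b for a, b in zip(s, t))


def perturbed_sign(n, m, eps):
    """Sign vector of n + eps * m."""
    return sgn(tuple(a + eps * b for a, b in zip(n, m)))


# Hypothetical sample vectors; exact rationals avoid rounding issues.
n = (Fraction(1), Fraction(0), Fraction(-2), Fraction(0))
m = (Fraction(-5), Fraction(3), Fraction(7), Fraction(0))

small = Fraction(1, 10)   # below both 1/5 and 2/7, so no sign of n is flipped
large = Fraction(1)       # too large: m overrides the signs of n

print(perturbed_sign(n, m, small) == compose(sgn(n), sgn(m)))
print(perturbed_sign(n, m, large) == compose(sgn(n), sgn(m)))
```

The second comparison fails on purpose: it shows why (33) requires the perturbation parameter to be small enough.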

For every dual vector $l$, i.e. every linear functional on the space containing
the columns $a_x$ of $A$, let $N_l^+ := \{x \in \mathcal{X} : l(a_x) > 0\}$ and
$N_l^- := \{x \in \mathcal{X} : l(a_x) < 0\}$. This way we can associate a signed subset
$\operatorname{sgn}^*(l) := (N_l^+, N_l^-)$ with $l$. The signed subset $\operatorname{sgn}^*(l)$ is called a
covector. Let $C$ be the set of all covectors. If the signed subset $(N_l^+, N_l^-)$ has
minimal support (i.e. "many" vectors lie on the hyperplane $l = 0$), then $l$ is
called a cocircuit vector⁵ and $\operatorname{sgn}^*(l)$ is called a signed cocircuit. The
collection of all signed cocircuits shall be denoted by $C^*$.
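The passage from a dual vector to its covector is easy to compute directly. The following minimal sketch assumes a hypothetical configuration, the four vertices of the unit square with a leading coordinate 1 (so that the constant vector lies in the row space); the function name `covector` and the labels "00", "10", "01", "11" are illustrative choices, not notation from the text.

```python
def covector(columns, l):
    """Return the pair (N_plus, N_minus) of the functional l on the columns a_x."""
    plus, minus = set(), set()
    for x, a in columns.items():
        value = sum(li * ai for li, ai in zip(l, a))
        if value > 0:
            plus.add(x)
        elif value < 0:
            minus.add(x)
    return frozenset(plus), frozenset(minus)


# Hypothetical example: columns a_x = (1, x1, x2) for the vertices of the unit square.
columns = {
    "00": (1, 0, 0),
    "10": (1, 1, 0),
    "01": (1, 0, 1),
    "11": (1, 1, 1),
}

# l = (1, 0, 0) is positive on every column.
print(covector(columns, (1, 0, 0)))
# l = (0, 1, 0) vanishes on the edge {00, 01} and is positive elsewhere.
print(covector(columns, (0, 1, 0)))
# l = (-1, 1, 1) vanishes on the diagonal {10, 01} and takes both signs.
print(covector(columns, (-1, 1, 1)))
```

Only the first two functionals give covectors with empty negative part; the third one separates the two remaining vertices, so its zero set is not cut out by a supporting hyperplane.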

Lemma 22. Let $(E, C)$ be an oriented matroid induced by a matrix $A$. Then
$(E, C^*)$ is an oriented matroid, called the dual oriented matroid.

Proof. See Section 3.4 of [BVS+93]. □

Note that the faces of the polytope correspond to hyperplanes such that all
vertices lie on one side of this hyperplane, compare Definition 5. Thus the faces
of the polytope are in a one-to-one relation with the positive covectors, i.e. the
covectors $X = (X^+, X^-)$ such that $X^- = \emptyset$. The face lattice of the polytope
can be reconstructed by partially ordering the positive covectors by inclusion
of their supports; however, the relation needs to be inverted: covectors with
small support correspond to faces which contain many vertices. The empty
face (which is induced, for example, by the dual vector $l_1$ which defines the
hyperplane containing all $a_x$) corresponds to the covector $T := (\mathcal{X}, \emptyset)$.

We can apply these remarks to all abstract oriented matroids such that
$T = (\mathcal{X}, \emptyset)$ is a covector. Such an oriented matroid is usually called acyclic.
Thus a face of an acyclic oriented matroid is any positive covector. A vertex
is a maximal positive covector $X$ in $C \setminus \{T\}$, i.e. if $X \subseteq Y$ for some positive
covector $Y \in C \setminus \{X\}$, then $Y = T$.

In this setting we have the following result, which clearly corresponds to the 
second statement of [T] 

Proposition 23 (Las Vergnas). Let $(E, C)$ be an acyclic oriented matroid. For
any subset $F \subseteq E$ the following are equivalent:

• $F$ is a face of the oriented matroid.

• For every signed circuit $X \in C$, if $X^+ \subseteq F$ then $X^- \subseteq F$.

Proof. The proof of Proposition 9.1.2 in [BVS+93] applies (note that the
statement of Proposition 9.1.2 includes an additional assumption which is never
used in the proof). □
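The circuit criterion of Proposition 23 can be checked by brute force on a small example. The sketch below assumes a hypothetical configuration, the four vertices of the unit square with columns $a_x = (1, x_1, x_2)$; its only linear dependency is $a_{00} + a_{11} = a_{10} + a_{01}$, so the signed circuits are this one signed subset and its negative. The function name `is_face` and the vertex labels are illustrative, not notation from the text.

```python
from itertools import combinations


def is_face(F, circuits):
    """Criterion of Proposition 23: X^+ inside F forces X^- inside F for every circuit."""
    F = frozenset(F)
    return all(minus <= F for plus, minus in circuits if plus <= F)


ground = ["00", "10", "01", "11"]

# Assumption: signed circuits of the homogenized unit square, entered by hand.
circuit = (frozenset({"00", "11"}), frozenset({"10", "01"}))
circuits = [circuit, (circuit[1], circuit[0])]

# Enumerate all subsets of the ground set that pass the criterion.
faces = [
    set(F)
    for r in range(len(ground) + 1)
    for F in combinations(ground, r)
    if is_face(F, circuits)
]

for F in faces:
    print(sorted(F))
```

The subsets that pass are the empty set, the four vertices, the four edges and the whole square; the two diagonals and all three-element subsets are rejected.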

With the help of the moment map defined in the previous section, this
proposition can be used to easily derive Theorem 15. By the properties of the
moment map, every face of the convex support corresponds to a possible support
set of an exponential family, and the proposition links this to the signed circuits
of the corresponding oriented matroid.

Finally, Corollary 16 can be rewritten as

Corollary 16'. The possible support sets of two exponential families coincide
if they have the same oriented matroids.

Unfortunately, this correspondence is not one-to-one: Different oriented
matroids can yield the same face lattice, i.e. combinatorially equivalent polytopes.
A simple example is given by a regular and a non-regular octahedron as
described in [Zie94]. The special case has a name: an oriented matroid is rigid if
its positive covectors (i.e. its face lattice) determine all covectors (i.e. the whole
oriented matroid). Still, Corollary 16' implies that the instruments of the theory
of oriented matroids should suffice to describe the support sets of an exponential
family.

Remark 24 (Importance of Duality). There are mainly two reasons why the 
theory of oriented matroids (as well as the theory of ordinary matroids) is 
considered important. First, it yields an abstract framework which allows one to
describe a multitude of different combinatorial questions in a unified manner.
This, of course, does not in itself lead to any new theorem. The second reason 
is that the theory provides the important tool of matroid duality. 

It turns out that the dual of a realizable matroid is again realizable: If $A$ is
a matrix representing an oriented matroid $(E, C)$, then any matrix $A^*$ such that
the rows of $A^*$ span the orthogonal complement of the row span of $A$ represents
the oriented matroid $(E, C^*)$.
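A matrix $A^*$ of this kind can be obtained from any basis of the kernel of $A$, since the kernel is the orthogonal complement of the row span. The following is a minimal sketch using exact rational Gauss-Jordan elimination; the function name `nullspace_basis` and the sample matrix (the homogenized unit square, columns ordered 00, 10, 01, 11) are assumptions made for this illustration, not part of the text.

```python
from fractions import Fraction


def nullspace_basis(A):
    """Basis of {v : A v = 0}, computed by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in A]
    ncols = len(rows[0])
    pivots = []
    r = 0
    for c in range(ncols):
        if r == len(rows):
            break
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        # Clear column c in all other rows.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    # One basis vector per free (non-pivot) column.
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[f] = Fraction(1)
        for i, c in enumerate(pivots):
            v[c] = -rows[i][f]
        basis.append(v)
    return basis


# Hypothetical example: rows are (1,1,1,1), the first and the second coordinate.
A = [
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]

A_star = nullspace_basis(A)
print(A_star)
```

Any other basis of the same kernel would serve equally well as $A^*$; the elimination above is just one convenient way to produce one.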

To motivate the importance of this construction we sketch its implications
for the case that the oriented matroid comes from a polytope. In this case
the duality is known under the name Gale transform [Zie94, Chapter 6]. A
$d$-dimensional polytope with $N$ vertices can be represented by $N$ vectors in $\mathbb{R}^{d+1}$
lying in a hyperplane. These vectors form a $(d+1) \times N$ matrix $A$. Now we can
find an $(N-d-1) \times N$ matrix $A^*$ as above, so the dual matroid is represented
by a configuration of $N$ vectors in $\mathbb{R}^{N-d-1}$. This means that this construction
allows us to obtain a low-dimensional image of a high-dimensional polytope, as
long as the number of vertices is not much larger than the dimension. This
method has been used for example in [Stu88] in order to construct polytopes
with quite unintuitive properties, leading to the rejection of some conjectures.
Furthermore, oriented matroid duality makes it possible to classify polytopes
with "few vertices" by classifying vector configurations.

The notion of dimension generalizes to arbitrary oriented matroids (and
ordinary matroids). In the general setting one usually talks about the rank of
a matroid, which is defined as the maximal cardinality of a subset $F \subseteq E$ such
that $F$ contains no support of a signed circuit. In this sense duality exchanges
examples of high rank and low rank, where "high" and "low" are relative to $|E|$.
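The rank as defined here can be computed by brute force when the circuit supports are known. The sketch below assumes a hypothetical example, the homogenized unit square on four points and its dual represented by the single row (1, -1, -1, 1); the circuit supports are entered by hand (the whole ground set for the square, every pair of points for the dual), and the function name `rank` is an illustrative choice.

```python
from itertools import combinations


def rank(ground, circuit_supports):
    """Largest size of a subset of the ground set containing no circuit support."""
    for r in range(len(ground), -1, -1):
        for F in combinations(ground, r):
            chosen = set(F)
            if not any(s <= chosen for s in circuit_supports):
                return r
    return 0


ground = ["00", "10", "01", "11"]

# Assumption: the only dependency among the square's columns involves all four points.
primal_supports = [frozenset(ground)]
# Assumption: for the dual row (1, -1, -1, 1) any two columns are dependent.
dual_supports = [frozenset(p) for p in combinations(ground, 2)]

print(rank(ground, primal_supports), rank(ground, dual_supports))
```

In this example the two ranks add up to the size of the ground set, which illustrates how duality trades a high rank for a low one.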